Robust Estimation with Discrete Explanatory Variables

Author: FR Hampel
M Hubert
M Orhan
P Čížek
PJ Rousseeuw
Publication venue: Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät
Publication date: 01/01/2002

Crossref

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

Bounded Influence Regression in the Presence of Heteroskedasticity of Unknown Form

Author: CJ Stone
FR Hampel
J Jureckova
PJ Huber
PM Robinson
RJ Carroll
VJ Yohai
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 01/01/1991

In a regression model with conditional heteroskedasticity of unknown form, we propose a general class of M-estimators scaled by nonparametric estimates of the conditional standard deviations of the dependent variable. We give regularity conditions under which these estimators are asymptotically equivalent to M-estimators scaled by the true conditional standard deviations. The practical performance of these estimators is investigated through a Monte Carlo experiment.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Yet another breakdown point notion: EFSBP - illustrated at scale-shape models

Author: A Balkema
A Marazzi
DL Donoho
E Castillo
FR Hampel
J Pickands
K Boudt
L Peng
MG Genton
ML Eaton
Nataliya Horbenko
P Ruckdeschel
Peter Ruckdeschel
PJ Rousseeuw
PL Davies
V Brazauskas
W Hoeffding
X He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/06/2011

The breakdown point in its different variants is one of the central notions to quantify the global robustness of a procedure. We propose a simple supplementary variant which is useful in situations where we have no obvious or only partial equivariance: Extending the Donoho and Huber (1983) Finite Sample Breakdown Point, we propose the Expected Finite Sample Breakdown Point to produce less configuration-dependent values while still preserving the finite sample aspect of the former definition. We apply this notion to the joint estimation of scale and shape (with only scale-equivariance available), exemplified for generalized Pareto, generalized extreme value, Weibull, and Gamma distributions. In these settings, we are interested in highly robust, easy-to-compute initial estimators; to this end we study Pickands-type and Location-Dispersion-type estimators and compute their respective breakdown points. Comment: 21 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

Private Drinking Water Wells as a Source of Exposure to Perfluorooctanoic Acid (PFOA) in Communities Surrounding a Fluoropolymer Production Facility

Author: Bartell S
Gibson S
Hampel FR
Kate Hoffman
Marc G. Weisskopf
Scott M. Bartell
Thomas F. Webster
Tony Fletcher
U.S. EPA
Verónica M. Vieira
Vieira V
Webster TF
Publication venue: National Institute of Environmental Health Sciences
Publication date: 04/10/2010

BACKGROUND: The C8 Health Project was established in 2005 to collect data on perfluorooctanoic acid (PFOA, or C8) and human health in Ohio and West Virginia communities contaminated by a fluoropolymer production facility. OBJECTIVE: We assessed PFOA exposure via contaminated drinking water in a subset of C8 Health Project participants who drank water from private wells. METHODS: Participants provided demographic information and residential, occupational, and medical histories. Laboratory analyses were conducted to determine serum-PFOA concentrations. PFOA data were collected from 2001 through 2005 from 62 private drinking water wells. We examined the relationship between drinking water and PFOA levels in serum using robust regression methods. As a comparison with regression models, we used a first-order, single-compartment pharmacokinetic model to estimate the serum:drinking-water concentration ratio at steady state. RESULTS: The median serum PFOA concentration in 108 study participants who used private wells was 75.7 μg/L, approximately 20 times greater than the levels in the U.S. general population but similar to those of local residents who drank public water. Each 1 μg/L increase in PFOA levels in drinking water was associated with an increase in serum concentrations of 141.5 μg/L (95% confidence interval, 134.9-148.1). The serum:drinking-water concentration ratio for the steady-state pharmacokinetic model was 114. CONCLUSIONS: PFOA-contaminated drinking water is a significant contributor to PFOA levels in serum in the study population. Regression methods and pharmacokinetic modeling produced similar estimates of the relationship
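The steady-state ratio mentioned in the abstract can be illustrated with a minimal sketch of a generic first-order, single-compartment model. The function name, parameter names, and all numeric inputs below are illustrative assumptions, not values or code from the study. At steady state, intake equals elimination, so the serum:drinking-water ratio is (absorbed fraction × daily water intake) / (elimination rate × volume of distribution).

```python
import math


def steady_state_ratio(water_intake_l_per_day, half_life_days,
                       volume_of_distribution_l, absorbed_fraction=1.0):
    """Serum:drinking-water concentration ratio at steady state.

    Generic one-compartment model with first-order elimination:
        dC/dt = F * I * C_water / V_d - k * C
    Setting dC/dt = 0 gives C / C_water = F * I / (k * V_d),
    where k = ln(2) / half-life.
    """
    if half_life_days <= 0 or volume_of_distribution_l <= 0:
        raise ValueError("half-life and volume of distribution must be positive")
    k = math.log(2) / half_life_days  # first-order elimination rate (1/day)
    return absorbed_fraction * water_intake_l_per_day / (k * volume_of_distribution_l)


# Hypothetical inputs chosen only to exercise the function.
example = steady_state_ratio(
    water_intake_l_per_day=1.0,
    half_life_days=math.log(2),      # makes k exactly 1 per day
    volume_of_distribution_l=2.0,
)
print(example)
```

The ratio grows with a longer half-life or a smaller volume of distribution, which is why slowly eliminated compounds can show serum levels far above water levels.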

Crossref

LSHTM Research Online

PubMed Central

eScholarship - University of California

Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

Author: A Burton
A Rouxel
AC Mertens
Andrea Marshall
BL Thomsen
C Serrat
D Collett
DB Rubin
DG Altman
Douglas G Altman
DW Hosmer
FE Harrell
FR Hampel
G Ambler
G Vaughn
HC van Houwelingen
J O'Quigley
JA Hoeting
JC Wyatt
JL Schafer
JW Graham
KH Li
M Schemper
MG Kenward
MW Heymans
N Orsini
O Harel
P Peduzzi
P Royston
Patrick Royston
RA Fisher
Roger L Holder
S Gill
S Sinharay
S van Buuren
T Bärnighausen
TG Clark
WM Stadler
XL Meng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
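The combination step described in the abstract can be sketched with the standard form of Rubin's rules for a scalar estimate. The function name `pool_rubin` and the sample numbers are hypothetical and are not taken from the paper. The pooled estimate is the mean of the per-imputation estimates, and the total variance is the within-imputation variance plus the between-imputation variance inflated by (1 + 1/m).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: the squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    pooled = mean(estimates)          # overall point estimate
    within = mean(variances)          # within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

Because these rules rely on approximate normality, a quantity such as a survival probability could be pooled on a transformed scale and then back-transformed. The abstract does not say which transformation to use, so that choice is left open here.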

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

Directory of Open Access Journals

PubMed Central

UCL Discovery

Warwick Research Archives Portal Repository

Oxford University Research Archive

On the Schoenberg Transformations in Data Analysis: Theory and Illustrations

The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A simple distance-based discriminant algorithm illustrates the theory, intimately connected to the Gaussian kernels of Machine Learning.

arXiv.org e-Print Archive

Crossref

Serveur académique lausannois

Research Papers in Economics

Assessing Levels of Attention Using Low Cost Eye Tracking

Author: B Laeng
C Varazzani
F Pedregosa
F Pérez
FR Hampel
G Aston-Jones
J Fan
J Hyönä
Jackson Beatty
JD Hunter
JW Peirce
K Holmqvist
MI Posner
S Gabay
S Joshi
S Van Der Walt
TE Oliphant
YS Ang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The emergence of mobile eye trackers embedded in next generation smartphones or VR displays will make it possible to trace not only what objects we look at but also the level of attention in a given situation. Exploring whether we can quantify the engagement of a user interacting with a laptop, we apply mobile eye tracking in an in-depth study over 2 weeks with nearly 10,000 observations to assess pupil size changes, related to attentional aspects of alertness, orientation and conflict resolution. Visually presenting conflicting cues and targets, we hypothesize that it is feasible to measure the allocated effort when responding to confusing stimuli. Although such experiments are normally carried out in a lab, we are able to differentiate between sustained alertness and complex decision making even with low cost eye tracking "in the wild". From a quantified self perspective of individual behavioral adaptation, the correlations between the pupil size and the task dependent reaction time and error rates may in the longer term provide a foundation for adapting smartphone content and interaction to the user's perceived level of attention. Comment: 12 pages, 6 figures, 2 tables. The final publication will be available at Springer via http://dx.doi.org/DOIxxx, when published as part of the HCI International 2016 Conference Proceedings

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

Author: A Agresti
A Silberschatz
AA Freitas
B Efron
B Liu
BG Buchanan
C Silverstein
D Klahr
D Tsur
FR Hampel
Hilderman and Hamilton
MD Gordon
Mir S Siadaty
MJ Zaki
ML Antonie
N Ye
OR Zaïane
P Srinivasan
PN Tan
R Bayardo
R Grossman
S Mitra
S Yoon
V Maojo
William A Knaus
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. 
The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Fish Consumption and Mercury Exposure among Louisiana Recreational Anglers

Author: David B. Senn
Dellenbarger L
Donna J. Vorhees
Edward J. Chesney
Grandjean P
Hampel FR
Institute of Medicine
James P. Shine
NRC (National Research Council)
Philippe Grandjean
R Development Core Team
Rebecca A. Lincoln
U.S. EPA (U.S. Environmental Protection Agency)
Virtanen JK
Publication venue: National Institute of Environmental Health Sciences
Publication date: 01/01/2011

Background: Methylmercury (MeHg) exposure assessments among average fish consumers in the United States may underestimate exposures among U.S. subpopulations with high intakes of regionally specific fish. Objectives: We examined relationships among fish consumption, estimated mercury (Hg) intake, and measured Hg exposure within one such potentially highly exposed group, recreational anglers in the state of Louisiana, USA. Methods: We surveyed 534 anglers in 2006 using interviews at boat launches and fishing tournaments combined with an Internet-based survey method. Hair samples from 402 of these anglers were collected and analyzed for total Hg. Questionnaires provided information on species-specific fish consumption during the 3 months before the survey. Results: Anglers' median hair-Hg concentration was 0.81 μg/g (n = 398; range, 0.02–10.7 μg/g); 40% of participants had levels >1 μg/g, which approximately corresponds to the U.S. Environmental Protection Agency's reference dose. Fish consumption and Hg intake were significantly positively associated with hair-Hg. Participants reported consuming nearly 80 different fish types, many of which are specific to the region. Unlike the general U.S. population, which acquires most of its Hg from commercial seafood sources, approximately 64% of participants' fish meals and 74% of their estimated Hg intake came from recreationally caught seafood. Conclusions: Study participants had relatively elevated hair-Hg concentrations and reported consumption of a wide variety of fish, particularly locally caught fish. This group represents a highly exposed subpopulation with an exposure profile that differs from fish consumers in other regions of the United States, suggesting a need for more regionally specific exposure estimates and public health advisories. ISSN: 1552-9924; ISSN: 0091-676

Repository for Publications and Research Data

Crossref

PubMed Central

University of Southern Denmark Research Output